
    PBCS: Efficient Exploration and Exploitation Using a Synergy between Reinforcement Learning and Motion Planning

    The exploration-exploitation trade-off is at the heart of reinforcement learning (RL). However, most continuous control benchmarks used in recent RL research only require local exploration. This has led to the development of algorithms that have only basic exploration capabilities and behave poorly in benchmarks that require more versatile exploration. For instance, as demonstrated in our empirical study, state-of-the-art RL algorithms such as DDPG and TD3 are unable to steer a point mass in even small 2D mazes. In this paper, we propose a new algorithm called "Plan, Backplay, Chain Skills" (PBCS) that combines motion planning and reinforcement learning to solve hard exploration environments. In the first phase, a motion planning algorithm is used to find a single good trajectory; then an RL algorithm is trained using a curriculum derived from this trajectory, by combining a variant of the Backplay algorithm with skill chaining. We show that this method outperforms state-of-the-art RL algorithms in 2D maze environments of various sizes and is able to improve on the trajectory obtained by the motion planning phase.
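
    The Backplay part of such a curriculum can be sketched as follows. This is a minimal illustration under stated assumptions, not the PBCS implementation: the planned trajectory is assumed to be a list of states, `run_episode` is a hypothetical callback that trains the agent for one episode from a given start state and reports whether the goal was reached, and the stage size and success threshold are placeholder values. Skill chaining is omitted.

```python
from typing import Callable, List, Sequence, TypeVar

State = TypeVar("State")


def backplay_curriculum(
    trajectory: Sequence[State],
    run_episode: Callable[[State], bool],
    episodes_per_stage: int = 20,
    success_threshold: float = 0.8,
    step_back: int = 1,
    max_stages: int = 1000,
) -> List[int]:
    """Move the episode start state backwards along a planned trajectory.

    Hypothetical sketch: training starts from the last state of the
    trajectory (closest to the goal). When the success rate over one stage
    reaches `success_threshold`, the start index moves `step_back` states
    earlier. Returns the start indices visited; the curriculum is complete
    once index 0 (the true initial state) has been mastered.
    """
    start = len(trajectory) - 1
    history = [start]
    for _ in range(max_stages):
        # run_episode is assumed to update the agent and return True on success
        wins = sum(run_episode(trajectory[start]) for _ in range(episodes_per_stage))
        if wins / episodes_per_stage >= success_threshold:
            if start == 0:
                break  # the agent now succeeds from the initial state
            start = max(0, start - step_back)
            history.append(start)
    return history


# Usage with toy "agents" that never learn: one always succeeds,
# the other only succeeds from states >= 0.5
trajectory = [0.0, 0.25, 0.5, 0.75, 1.0]
print(backplay_curriculum(trajectory, lambda s: True))                     # [4, 3, 2, 1, 0]
print(backplay_curriculum(trajectory, lambda s: s >= 0.5, max_stages=10))  # [4, 3, 2, 1]
```

    In the second call the curriculum stalls at index 1. A stall of this kind is presumably where the skill-chaining component matters, although the abstract does not say exactly when PBCS switches to a new skill.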

    The problem with DDPG: understanding failures in deterministic environments with sparse rewards

    In environments with continuous state and action spaces, state-of-the-art actor-critic reinforcement learning algorithms can solve very complex problems, yet they can also fail in environments that seem trivial, and the reasons for such failures are still poorly understood. In this paper, we contribute a formal explanation of these failures in the particular case of sparse rewards and deterministic environments. First, using a very elementary control problem, we illustrate that the learning process can get stuck in a fixed point corresponding to a poor solution. Then, generalizing from the studied example, we provide a detailed analysis of the underlying mechanisms, which results in a new understanding of one of the convergence regimes of these algorithms. The resulting perspective casts new light on existing solutions to the issues we have highlighted and suggests other potential approaches. Comment: 19 pages, submitted to ICLR 202
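
    The most trivial form of such a fixed point can be reproduced in a few lines. The sketch below is a hypothetical toy setup, not the paper's experiment: a deterministic 1D environment whose only reward is at s = 1, a replay buffer collected near s = 0 that never reaches the goal, and DDPG-style updates with a linear critic Q(s, a) = w0 + w1 s + w2 a initialized to zero and a linear actor. Because every stored reward is 0, the TD error stays exactly 0, the critic never changes, and the actor gradient dQ/da = w2 stays 0, so neither the critic nor the actor moves. All names (`env_step`, `train`, etc.) and constants are illustrative assumptions.

```python
import random
from typing import List, Tuple

GAMMA = 0.99
MAX_ACTION = 0.1


def env_step(s: float, a: float) -> Tuple[float, float, bool]:
    """Toy deterministic 1D environment: reward 1 only when s reaches 1."""
    a = max(-MAX_ACTION, min(MAX_ACTION, a))
    s_next = max(0.0, min(1.0, s + a))
    done = s_next >= 1.0
    return s_next, (1.0 if done else 0.0), done


def q_value(w: List[float], s: float, a: float) -> float:
    # Linear critic Q(s, a) = w0 + w1 * s + w2 * a
    return w[0] + w[1] * s + w[2] * a


def actor(theta: List[float], s: float) -> float:
    # Linear deterministic policy pi(s) = theta0 + theta1 * s
    return theta[0] + theta[1] * s


def train(w, theta, n_updates=2000, lr_critic=0.1, lr_actor=0.01, seed=0):
    rng = random.Random(seed)
    # Replay buffer collected close to s = 0: the sparse reward is never observed
    buffer = []
    for _ in range(200):
        s, a = rng.uniform(0.0, 0.3), rng.uniform(-MAX_ACTION, MAX_ACTION)
        buffer.append((s, a) + env_step(s, a))
    w, theta = list(w), list(theta)
    for _ in range(n_updates):
        s, a, s_next, r, done = rng.choice(buffer)
        target = r if done else r + GAMMA * q_value(w, s_next, actor(theta, s_next))
        td_error = target - q_value(w, s, a)
        # Semi-gradient TD update of the linear critic
        w = [w[0] + lr_critic * td_error,
             w[1] + lr_critic * td_error * s,
             w[2] + lr_critic * td_error * a]
        # Deterministic policy gradient dQ/da * dpi/dtheta (action clipping ignored)
        dq_da = w[2]
        theta = [theta[0] + lr_actor * dq_da, theta[1] + lr_actor * dq_da * s]
    return w, theta


w, theta = train([0.0, 0.0, 0.0], [0.05, 0.0])
print(w, theta)  # [0.0, 0.0, 0.0] [0.05, 0.0]
```

    A randomly initialized critic does not stay frozen this simply; the sketch only shows why a buffer without any reward gives the actor no useful learning signal.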

    Optimization Techniques for Energy Minimization Problem in a Marked Point Process Application to Forestry

    We use marked point processes to detect an unknown number of trees from high-resolution aerial images. This approach turns out to be an energy minimization problem, where the energy contains a prior term that takes into account the geometrical properties of the objects and a data term that matches these objects to the image. This stochastic process is simulated via a Reversible Jump Markov Chain Monte Carlo procedure, which embeds a Simulated Annealing scheme to extract the best configuration of objects. In this paper, we compare different cooling schedules for the Simulated Annealing algorithm that could provide good minimization in a short time. We also study some adaptive proposal kernels.
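
    The role of the cooling schedule can be illustrated on a much simpler problem. The sketch below is a hypothetical stand-in, not the paper's RJMCMC sampler (there are no birth/death moves and no tree objects): Metropolis-type simulated annealing on a rugged 1D energy over the integers 0 to 99, comparing a geometric schedule T_k = T_0 α^k with a logarithmic schedule T_k = T_0 / ln(k + e). The energy function, move size, and parameter values are assumptions chosen for illustration.

```python
import math
import random
from typing import Callable, Tuple

Schedule = Callable[[int], float]


def geometric_schedule(t0: float, alpha: float) -> Schedule:
    """T_k = t0 * alpha**k: fast, exponential cooling."""
    return lambda k: t0 * alpha ** k


def logarithmic_schedule(t0: float) -> Schedule:
    """T_k = t0 / ln(k + e): very slow cooling, with T_0 = t0."""
    return lambda k: t0 / math.log(k + math.e)


def toy_energy(x: int) -> float:
    # Hypothetical rugged energy: a parabola plus cos(x), which adds many local minima
    return ((x - 70) / 10) ** 2 + 2.0 * math.cos(x)


def anneal(energy: Callable[[int], float], x0: int, schedule: Schedule,
           n_iter: int = 5000, seed: int = 0, lo: int = 0, hi: int = 99) -> Tuple[int, float]:
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for k in range(n_iter):
        t = max(schedule(k), 1e-12)  # guard against a temperature of exactly 0
        y = min(hi, max(lo, x + rng.randint(-5, 5)))  # random local move
        e_y = energy(y)
        # Metropolis rule: accept downhill moves, uphill ones with prob exp(-dE / T)
        if e_y <= e or rng.random() < math.exp(-(e_y - e) / t):
            x, e = y, e_y
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


for name, sched in [("geometric", geometric_schedule(5.0, 0.999)),
                    ("logarithmic", logarithmic_schedule(5.0))]:
    print(name, anneal(toy_energy, x0=0, schedule=sched))
```

    The geometric schedule cools quickly and can freeze in a local minimum, while the logarithmic one keeps accepting uphill moves for much longer. Logarithmic schedules are the ones with classical asymptotic convergence guarantees, which is why faster schedules are usually compared empirically when computation time matters.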

    Absolute palaeointensity of Oligocene (28-30 Ma) lava flows from the Kerguelen Archipelago (southern Indian Ocean)

    We report palaeointensity estimates obtained from three Oligocene volcanic sections from the Kerguelen Archipelago (Mont des Ruches, Mont des Tempêtes, and Mont Rabouillère). Of 402 available samples, 102 were suitable for palaeofield strength determination after a preliminary selection, and 49 of these provide a reliable estimate. Applying strict a posteriori criteria makes us confident in the quality of the 12 new mean-flow determinations, which are the first reliable data available for the Kerguelen Archipelago. The Virtual Dipole Moments (VDMs) calculated for these flows range from 2.78 to 9.47 × 10^22 A m², with an arithmetic mean of 6.15 ± 2.1 × 10^22 A m². Combining these results with a selection from the 2002 updated IAGA palaeointensity database leads to a higher Oligocene mean VDM (5.4 ± 2.3 × 10^22 A m²) than previously reported, identical to the 5.5 ± 2.4 × 10^22 A m² mean VDM obtained for the 0.3-5 Ma time window. However, these Kerguelen palaeointensity estimates make up half of the reliable Oligocene determinations and thus introduce a bias toward higher values. Nonetheless, the new estimates reported here strengthen the conclusion that the recent geomagnetic field strength is anomalously high compared with the field older than 0.3 Ma.
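
    For reference, a VDM is obtained from a palaeointensity B and an inclination I by assuming the field at the site is that of a geocentric dipole: the magnetic palaeolatitude λ follows from tan I = 2 tan λ, and VDM = (4πR³/μ0) B (1 + 3 sin²λ)^(-1/2), where R is the Earth's radius and μ0 the vacuum permeability. The sketch below applies this standard formula. The flow values in the usage lines are made up and are not data from this study, and reporting the spread as a sample standard deviation is an assumption.

```python
import math
import statistics
from typing import Sequence, Tuple

MU0 = 4e-7 * math.pi      # vacuum permeability (T m/A)
EARTH_RADIUS = 6.371e6    # mean Earth radius (m)


def vdm(intensity_ut: float, inclination_deg: float) -> float:
    """Virtual dipole moment in A m^2 from an intensity in microtesla and an inclination in degrees."""
    b = intensity_ut * 1e-6  # microtesla -> tesla
    # Magnetic palaeolatitude from the dipole relation tan(I) = 2 tan(lambda)
    lat = math.atan(0.5 * math.tan(math.radians(inclination_deg)))
    return 4 * math.pi * EARTH_RADIUS ** 3 / MU0 * b / math.sqrt(1 + 3 * math.sin(lat) ** 2)


def mean_and_spread(values: Sequence[float]) -> Tuple[float, float]:
    # Arithmetic mean and sample standard deviation (n - 1 denominator)
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical flows: (intensity in microtesla, inclination in degrees)
flows = [(30.0, -70.0), (45.0, -65.0), (20.0, -75.0)]
moments = [vdm(b, inc) / 1e22 for b, inc in flows]  # in units of 10^22 A m^2
print(mean_and_spread(moments))
```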

    Karhunen-Loève expansion revisited for vector-valued random fields: scaling, errors and optimal basis

    Due to scaling effects, when dealing with vector-valued random fields, the classical Karhunen-Loève expansion, which is optimal with respect to the total mean square error, tends to favor the components of the random field that have the highest signal energy. When these random fields are used in mechanical systems, this phenomenon can introduce undesired biases in the results. This paper therefore presents an adaptation of the Karhunen-Loève expansion that allows these biases to be controlled and minimized. This original decomposition is first analyzed from a theoretical point of view and then illustrated on a numerical example.
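
    The scaling effect can be seen without any eigensolver in the special case where the components of the field are uncorrelated, so that the covariance is block-diagonal and the KL modes of the whole field are simply the modes of each component. The sketch below assumes that case, with made-up eigenvalue spectra: the classical expansion keeps the M largest eigenvalues overall, while the scaled variant ranks each eigenvalue relative to its component's total variance. This normalization is one simple choice for illustration, not the optimal basis derived in the paper.

```python
from typing import Dict, List


def truncation_errors(spectra: Dict[str, List[float]], m: int, scaled: bool) -> Dict[str, float]:
    """Relative mean-square truncation error of each component after keeping m modes.

    Assumes uncorrelated components: the field's KL eigenvalues are the union of
    the per-component eigenvalues. With scaled=True, each eigenvalue is divided by
    its component's total variance before ranking (a hypothetical normalization).
    """
    modes = []
    for name, eigs in spectra.items():
        weight = sum(eigs) if scaled else 1.0
        modes.extend((lam / weight, name, lam) for lam in eigs)
    modes.sort(key=lambda mode: mode[0], reverse=True)
    kept = modes[:m]
    errors = {}
    for name, eigs in spectra.items():
        kept_variance = sum(lam for _, owner, lam in kept if owner == name)
        errors[name] = 1.0 - kept_variance / sum(eigs)
    return errors


# Component "u" carries 100 times more energy than component "v"
spectra = {"u": [100.0, 50.0, 25.0, 12.5], "v": [1.0, 0.5, 0.25, 0.125]}
print(truncation_errors(spectra, m=4, scaled=False))  # u fully kept, v entirely lost
print(truncation_errors(spectra, m=4, scaled=True))   # both components keep 80% of their variance
```

    With the same number of modes, the classical ranking spends all of them on the high-energy component, which is exactly the kind of bias the abstract describes.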

    Probabilistic simulation for the certification of railway vehicles

    The current dynamic certification process, which is based on experiments, has essentially been built on experience. Introducing simulation techniques into this process would be of great interest. However, accurately simulating complex nonlinear systems is difficult, in particular when rare events (for example, unstable behaviour) are considered. After analysing the system and the procedure currently used, this paper proposes a method to achieve simulation-based certification in some particular cases. It focuses on the need for precise and representative excitations (running conditions) and on their variable nature. A probabilistic approach is therefore proposed and illustrated using an example. First, this paper presents a short description of the vehicle/track system and of the experimental procedure. The proposed simulation process is then described, and the requirement to analyse a set of running conditions at least as large as the one tested experimentally is explained. In the third section, a sensitivity analysis to determine the most influential parameters of the system is reported. Finally, the proposed method is summarized and an application is presented.
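
    One way to read the probabilistic approach is as propagating randomly drawn running conditions through a simulation and estimating the probability that a certification criterion is exceeded. The sketch below shows plain Monte Carlo estimation of such an exceedance probability with a 95% normal-approximation confidence half-width. The response function, parameter names, distributions, and limit are hypothetical toy stand-ins for a vehicle/track simulation, not the paper's model.

```python
import math
import random
from typing import Callable, Dict, Tuple

Conditions = Dict[str, float]


def estimate_exceedance(
    response: Callable[[Conditions], float],
    sample_conditions: Callable[[random.Random], Conditions],
    limit: float,
    n: int = 10_000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Monte Carlo estimate of P(response > limit) and its 95% half-width."""
    rng = random.Random(seed)
    hits = sum(response(sample_conditions(rng)) > limit for _ in range(n))
    p = hits / n
    half_width = 1.96 * math.sqrt(p * (1.0 - p) / n)
    return p, half_width


# Hypothetical running conditions and a toy response standing in for a simulation
def sample_conditions(rng: random.Random) -> Conditions:
    return {"speed_kmh": rng.uniform(100.0, 200.0), "friction": rng.uniform(0.2, 0.5)}


def toy_response(c: Conditions) -> float:
    return c["speed_kmh"] / 200.0 + c["friction"]


p, hw = estimate_exceedance(toy_response, sample_conditions, limit=1.2)
print(f"P(exceedance) ~ {p:.3f} +/- {hw:.3f}")  # exact value for this toy model: 0.3
```

    For small p, the relative half-width behaves roughly like 1.96 / sqrt(n p), so plain sampling needs very many simulation runs to estimate the probability of rare events with useful precision.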